The ability to compare the semantic similarity between text corpora is important in a variety of natural language processing applications. However, standard methods for evaluating corpus-level similarity metrics have yet to be established. We propose a set of automatic and interpretable measures for assessing the characteristics of corpus-level semantic similarity metrics, allowing sensible comparison of their behavior. We demonstrate the effectiveness of our evaluation measures in capturing fundamental characteristics by applying them to a collection of classical and state-of-the-art metrics. Our measures reveal that recently developed metrics are becoming better at identifying semantic distributional mismatch, while classical metrics are more sensitive to perturbations at the surface level of the text.
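Text generation models have become the focus of many research tasks, particularly the generation of sentence corpora. However, understanding the properties of an automatically generated text corpus remains challenging. We propose a set of tools that examine the properties of generated text corpora. Applying these tools to a variety of generated corpora allows us to gain new insights into the properties of the generative models. As part of our characterization process, we found notable differences between the corpora produced by two leading generation techniques.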